Abstract: Many problems in early vision can be formulated in terms of minimizing an
energy or cost function. Examples are shape-from-shading, edge detection, motion analysis, 
structure from motion and surface interpolation (Poggio, Torre and Koch, 1985). It has been 
shown that all quadratic variational problems, an important subset of early vision tasks, can 
be "solved" by linear, analog electrical or chemical networks (Poggio and Koch, 1985). 
In a variety of situations the cost function is non-quadratic, however, for instance in the 
presence of discontinuities. The use of non-quadratic cost functions raises the question of 
designing efficient algorithms for computing the optimal solution. Recently, Hopfield and
Tank (1985) have shown that networks of nonlinear analog "neurons" can be effective in
computing the solution of optimization problems. In this paper, we show how these networks 
can be generalized to solve the non-convex energy functionals of early vision. We illustrate 
this approach by implementing a specific network solving the problem of reconstructing 
a smooth surface while preserving its discontinuities from sparsely sampled data (Geman 
and Geman, 1984; Marroquin, 1984; Terzopoulos, 1984). These results suggest a novel 
computational strategy for solving such problems for both biological and artificial vision 
systems. 
1. Introduction


The study undertaken in this manuscript has its origin in two different areas: computational 
vision and neuronal networks. Problems in early vision, such as computing depth from two 
stereoscopic images, reconstructing and smoothing images from sparsely sampled data, 
computing motion, etc., are inherently difficult to solve, although to us they seem effortless.
In recent years, computational studies have provided promising, although far from
complete, theories of the computations necessary for early vision (for partial reviews see
Marr, 1982; Brady, 1982; Ballard, Hinton and Sejnowski, 1983; Poggio, Torre and Koch, 
1985). Early vision consists of a set of processes that recover physical properties of the 
visible three-dimensional surfaces from the two-dimensional intensity arrays (Marr, 1982). 
A number of these tasks can be described within the framework of standard regularization 
theory (Poggio and Torre, 1984; Poggio et al., 1985): edge detection, smooth surface 
interpolation and computing the smoothest velocity field. Standard regularization analysis 
can be used to solve them in terms of quadratic energy functionals which must be minimized, 
subject to certain constraints. Previous work by Poggio and Koch (1985; see also 1984) 
showed how to design linear, analog networks for solving regularization problems with 
quadratic energy functions. The domain of applicability of standard regularization theory 
is limited, however, by the convexity of the energy functions, which makes it impossible to
deal with problems involving true discontinuities without introducing new concepts (Poggio
et al., 1985). Such problems can be described by non-convex energy functions involving
line processes (Geman and Geman, 1984; Marroquin, 1984; see Blake, 1983). Methods 
proposed for minimizing these include simulated annealing (Kirkpatrick, Gelatt and Vecchi, 
1983; Geman and Geman, 1984) and graduated non-convexity (Blake, 1983). More recently,
Marroquin (1985b) has proposed a different approach, based on the use of Markov Random
Field (MRF) models and Bayes estimation theory, in which the solution to early vision
problems is not expressed in terms of minimizing an energy functional. The resulting 
algorithms, however, can also be implemented in analog and hybrid networks such as the 
ones we describe here. 

There has been considerable interest in recent years in the computational properties and 
capabilities of networks of simple, neuronal-like elements (for instance, Kohonen, 1977; 
Marr and Poggio, 1977; Ullman, 1979; Hopfield, 1982, 1984; Palm, 1982). More recently, 
Hopfield and Tank (1985) have shown that analog neuronal networks can provide fast, 
next-to-optimal solutions to a well characterized, but difficult, optimization problem, the 
Travelling Salesman Problem. In this paper we show that highly interconnected networks 
of simple, analog processing elements can be used to give fast solutions to a number of 
early vision problems. Apart from their intrinsic interest as neuronal models, these networks 
may have important practical applications for computer vision. Their ability to perform 
computations by analog elements implies fast convergence times with respect to the basic
cycle time of digital hardware. For many problems in early vision, a fast approximate answer
is more valuable than a slow correct one. The use of such networks in combination with
massively parallel computers offers the possibility of real-time artificial vision systems. One
caveat: we do not, of course, equate our impoverished model of a "neuron" with the
complexities of "real" neurons. Our neuronal networks do, however, share several important 
properties with nerve cells: high connectivity, analog mode of operation and tolerance to 
hardware failures. 


2. Smooth Surface Reconstruction 

Surface reconstruction is a typical problem of early vision that can be formulated in terms 
of minimizing a quadratic energy function (Grimson, 1981, 1982; Terzopoulos, 1983). It 
occurs in several situations. For example, if a stereo algorithm computes depth values 
only at specific locations in the image, for instance along edges (i.e., at zero crossings 
of the convolution of the image with the Laplacian of a Gaussian operator), the surface
must be interpolated between these points. Another instance occurs when the data is given 
everywhere but is noisy and needs to be smoothed. 

Grimson (1981, 1982) studied surface interpolation in the context of stereo matching. He 
considered a stereo algorithm (Marr and Poggio, 1979) in which isolated primitive features, 
zero crossings — corresponding to significant events in the images — were matched 
yielding a depth value at the feature points. He then proposed an interpolation scheme to 
obtain depth values throughout the image. A partial justification for the use of interpolation 
schemes in human vision comes from studies of random dot stereograms. Here data is only 
given at isolated dots and yet the perception is of a smooth surface, even when the density 
of dots is very low. Grimson’s interpolation scheme involves minimizing a quadratic energy 
function and can be described as fitting a thin flexible plate through the observed data. 
Both Grimson’s and Terzopoulos’s (1983) interpolation schemes can be described
in terms of regularization theory. The energy or cost function E(x) to be minimized,
subject to certain constraints which derive from a physical analysis of the problem under
consideration, is given by:


$$E(x) = \|Bx - b\|^2 + \lambda \|Sx\|^2 \qquad (1)$$

where x is the vector representing the image points, b is the data vector, λ is the
regularization parameter, and B and S are matrices. The first
term gives the distance of the solution to the data and the second term corresponds to the
regularizer needed to make the problem well-posed. For surface interpolation, the elements
of B are equal to 1 at those locations where the depth is known and 0 at all others.
The stabilizer S corresponds to the operator associated with a membrane or thin plate,
depending on the kind of smoothing desired. E can be reformulated as

$$E(x) = x^T (B^T B + \lambda S^T S)\, x - 2 x^T B^T b + b^T b. \qquad (2)$$

This expression can be transformed into 

$$L(V) = \frac{1}{2} \sum_{i,j} T_{ij} V_i V_j + \sum_i V_i I_i \qquad (3)$$

by identifying the matrix T with 2(B^T B + λ S^T S), V with x, and I with -2 B^T b, and dropping the
constant term b^T b. We can interpret this expression as the Lyapunov function of a neuronal
network with a linear input-output characteristic. Thus, the "voltage" V_i corresponds to the
output of the processing element i, henceforth termed neuron i, I_i is the current being
injected into neuron i, and T_ij is the strength of connectivity between neurons i and j. If
no connection exists between two neurons, the appropriate entry T_ij is set to zero. If
every neuron has an associated capacitance C_i, its output V_i will be updated according to:


$$C_i \frac{dV_i}{dt} = -\frac{\partial L}{\partial V_i} \qquad (4)$$


Note that this update corresponds to finding the minimum of a convex function using steepest 
descent. This rule ensures that L always decreases with time and hence corresponds 
to a Lyapunov function. For the case of quadratic regularization principles, this function 
is positive definite quadratic and so the system will always converge (except in some
pathological cases) to the unique energy minimum. In other words, every quadratic
variational principle of the type shown in equation (2) can be solved with an appropriate 
neuronal network, where the connections can be implemented by linear Ohmic resistances 
and the data is given by injecting currents. A similar result, using slightly different circuit 
components, was derived by Poggio and Koch (1985). 
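The mapping from equations (2) to (4) can be illustrated with a small numerical sketch. It is not the authors' simulation: it assumes a one-dimensional membrane (S taken as a first-difference matrix), unit capacitances, λ = 1, forward-Euler integration, and a hypothetical ramp of depth data sampled at every second point. The names `build_network`, `lyapunov` and `relax` are invented for the example.

```python
import numpy as np

def build_network(mask, data, lam=1.0):
    """Return (T, I) for E(x) = ||Bx - b||^2 + lam * ||Sx||^2 in one dimension."""
    n = len(data)
    B = np.diag(mask.astype(float))        # 1 where the depth is known, 0 elsewhere
    b = B @ data                           # observed depth values
    S = np.diff(np.eye(n), axis=0)         # first differences (membrane, assumed)
    T = 2.0 * (B.T @ B + lam * S.T @ S)    # connection matrix of equation (3)
    I = -2.0 * B.T @ b                     # injected currents
    return T, I

def lyapunov(T, I, V):
    """L(V) of equation (3)."""
    return 0.5 * V @ T @ V + V @ I

def relax(T, I, dt=0.05, steps=2000):
    """Forward-Euler integration of equation (4) with C_i = 1."""
    V = np.zeros(len(I))
    history = [lyapunov(T, I, V)]
    for _ in range(steps):
        V = V - dt * (T @ V + I)           # dV/dt = -dL/dV
        history.append(lyapunov(T, I, V))
    return V, np.array(history)

n = 16
data = np.linspace(0.0, 1.0, n)            # hypothetical depth ramp
mask = np.arange(n) % 2 == 0               # every second point is sampled
T, I = build_network(mask, data)
V, history = relax(T, I)
V_direct = np.linalg.solve(T, -I)          # stationary point of L: T V = -I
```

Under these assumptions the relaxed voltages `V` agree with the direct linear solve `V_direct`, and `history` is non-increasing, which is the Lyapunov property described above.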


3. Line Processes 


However, quadratic variational principles have limitations. The main problem is the degree 
of smoothness required for the unknown function that is to be recovered. For instance, the 
surface interpolation scheme outlined above smooths over edge discontinuities and often
leads to unrealistic results (figure 1). 

Marroquin (1984) has proposed a scheme to overcome this difficulty (see also Blake, 
1983; Terzopoulos, 1985). Following Geman and Geman (1984), he used a probabilistic 
Figure 1. Smooth and piecewise smooth surface reconstruction from noisy and sparse data. (a)
Three-dimensional representation of a reconstructed and smoothed surface from sparse observations
using a quadratic energy expression corresponding to interpolating with a membrane. Depth
measurements are only available along the ridge in the main plane of the figure and along the rim
of the "tower" (see arrows). Along these contours, on average every second point is sampled.
The observations are assumed to be corrupted by Gaussian noise (σ = 0.25). The state of the
system is shown at 0.5τ. (b) Piecewise smooth surface reconstruction from the same data set at time
τ. The image is clearly segmented into two distinct domains. Both surfaces are computed using our
network with constant coupling. For parameters, see text.

formulation of the surface reconstruction problem; the behavior of a piecewise smooth
surface is modeled using two coupled Markov random fields (Kindermann and Snell,
1980): a continuous-valued one that corresponds to the depth, and a binary one, whose 
variables are located at sites between the depth lattice (see figure 2a). The function of 
the unobservable "line process" is to indicate the presence or absence of a discontinuity 
between two adjacent depth elements. Using Bayes theory, it is found that the maximum a 
posteriori estimate of the surface corresponds to the global minimization of the "energy" 
function: 


$$E(f, l) = \sum_{i,j} (f_i - f_j)^2 (1 - l_{ij}) + \sum_i (f_i - d_i)^2 + \sum_c V_c(l), \qquad (5)$$

where the term V_c(l) (the "potential" of the line process l) measures the cost that has to
be paid for the introduction of specific local configurations of lines and embodies the prior 
knowledge about the geometry of the discontinuities (for instance, the fact that they occur 
along piecewise smooth curves that only rarely intersect). 

In one dimension the energy function is given by 
Figure 2. (a) The two-dimensional lattice of line processes (lines) and depth points (crosses).
Each depth value is enclosed on all four sides by a line process. (b) The local connections between
neighbouring line processes (filled squares) and the depth lattice (open squares). Notice that the
quadratic part of the network, i.e. the depth lattice, is only connected to its four neighbours, while
every line process, except at the boundaries, is connected to 10 neighbouring locations.


$$E(f, l) = \sum_i (f_{i+1} - f_i)^2 (1 - l_i) + c_D \sum_i (f_i - d_i)^2 + c_L \sum_i l_i. \qquad (6)$$

Here the l_i correspond to the line process. Observe that if the gradient of f becomes too
large (i.e. (f_{i+1} - f_i)^2 > c_L), it becomes cheaper to break the surface and put in a line,
paying the "price" c_L, rather than to interpolate smoothly. The line process l_i introduces
local minima into the energy function, making the problem highly non-linear. The term
(f_i - d_i)^2, describing the difference between the measured data d_i and the approximated
surface value, is weighted by c_D, which depends on the signal-to-noise ratio. If d_i
is very reliable, then c_D >> 1. For two-dimensional images more terms are required in
equation (6) (see equation (11)). In Marroquin’s study (1984), the potentials of the binary
line process were implemented by a table lookup procedure and the minimization of the
resulting combinatorial optimization problem was carried out using simulated annealing.
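The trade-off in equation (6) between interpolating across a step and paying for a line can be checked on a toy example. The sketch below is illustrative only: the function names, the six-point step profile and the values c_D = c_L = 1 are assumptions, and every point is treated as sampled.

```python
import numpy as np

def energy_1d(f, l, d, c_D=1.0, c_L=1.0):
    """Energy of equation (6) for a surface f, binary lines l and data d."""
    f, l, d = (np.asarray(a, dtype=float) for a in (f, l, d))
    smooth = np.sum(np.diff(f) ** 2 * (1.0 - l))   # interpolation term
    data = c_D * np.sum((f - d) ** 2)              # fidelity to the data
    lines = c_L * np.sum(l)                        # price paid per line
    return smooth + data + lines

def best_lines(f, c_L=1.0):
    """For a fixed surface, l_i = 1 is cheaper exactly when (f_{i+1} - f_i)^2 > c_L."""
    return (np.diff(np.asarray(f, dtype=float)) ** 2 > c_L).astype(float)

d = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]    # hypothetical step of height 3
f = d                                 # surface passing exactly through the data
E_smooth = energy_1d(f, np.zeros(5), d)
l_opt = best_lines(f)
E_broken = energy_1d(f, l_opt, d)
```

With these values the unbroken surface costs 9 (the squared jump), while a single line at the step costs only c_L = 1, so breaking the surface there is the cheaper configuration.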

Here we sketch another method based on Hopfield’s results (1982, 1984; see also Hopfield
and Tank, 1985; for more details see Marroquin, 1985a). Hopfield’s idea was to solve
combinatorial optimization problems by allowing the binary variable to vary continuously
between 0 and 1 and to introduce terms in the energy function that forced the final solution
to one of the corners of the hypercube [0,1]^N.

Briefly, let the output variable V_i for neuron i have the range 0 < V_i < 1 and be a
continuous and monotonically increasing function of the internal state variable u_i of the neuron
i: V_i = g_i(u_i). A typical choice is


Koch, Marroquin and Yuille 


Neuronal Networks 


$$g_i(u_i) = \frac{1}{1 + e^{-2\lambda u_i}} \qquad (7)$$


The neurons are highly interconnected. The strength of the connection between i and j is
given by the matrix element T_ij. Furthermore, each neuron has its own input capacitance
C_i and transmembrane resistance R_i. The resulting charging equation that determines the
rate of change of u_i is



$$C_i \frac{du_i}{dt} = \sum_j T_{ij} V_j - \frac{u_i}{R_i} + I_i, \qquad (8)$$


where I_i can be considered as fixed input to neuron i. Hopfield introduces the quantity


$$E = -\frac{1}{2} \sum_{i,j} T_{ij} V_i V_j + \sum_i \frac{1}{R_i} \int_0^{V_i} g_i^{-1}(V)\, dV - \sum_i I_i V_i, \qquad (9)$$


and shows that E is a Lyapunov function of the system, as long as T_ij is symmetric. In
other words, using the update of equation (8):


$$C_i \frac{du_i}{dt} = -\frac{\partial E}{\partial V_i} \qquad (10)$$


the time evolution of the system is a motion in state space that seeks out minima in E and
comes to a stop at such points. The relation between the stable states of the continuous
model and those of the binary one, in which the output of every neuron can be either 0
or 1, is governed by λ. For λ → ∞, g_i tends to either 0 or 1.
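A small numerical sketch can make the Lyapunov property concrete. It integrates the charging equation (8) by forward Euler for a hypothetical three-neuron network and records E along the trajectory, with the input term entering E as -Σ I_i V_i so that the right-hand side of (8) equals -∂E/∂V_i. The connection matrix, inputs, gain λ = 2, unit R_i and C_i, and step size are all assumptions chosen for illustration, and the integral of the inverse sigmoid is evaluated in closed form.

```python
import numpy as np

LAM = 2.0  # gain lambda of equation (7); assumed value for this sketch

def g(u):
    """Sigmoid input-output relation of equation (7)."""
    return 1.0 / (1.0 + np.exp(-2.0 * LAM * u))

def energy(T, I, R, V):
    """E of equation (9); the integral of g^{-1} from 0 to V_i in closed form."""
    integral = (V * np.log(V) + (1.0 - V) * np.log(1.0 - V)) / (2.0 * LAM)
    return -0.5 * V @ T @ V + np.sum(integral / R) - I @ V

def simulate(T, I, R, C, u0, dt=0.01, steps=3000):
    """Forward-Euler integration of equation (8), recording E at every step."""
    u = np.array(u0, dtype=float)
    energies = []
    for _ in range(steps):
        V = g(u)
        energies.append(energy(T, I, R, V))
        u = u + dt * (T @ V - u / R + I) / C
    return g(u), np.array(energies)

# Hypothetical symmetric connections and fixed inputs.
T = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
I = np.array([0.5, -0.2, 0.1])
R = np.ones(3)
C = np.ones(3)
V_final, energies = simulate(T, I, R, C, u0=[0.1, -0.2, 0.05])
```

For a sufficiently small step the recorded `energies` never increase, in line with the argument above; the forward-Euler discretization itself is a convenience of the sketch and not part of Hopfield's analog model.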


In order to be able to use these analog networks for piecewise smooth surface reconstruction,
we have to map the binary line processes l into continuous variables bounded by 0 and
1. One possibility for choosing an appropriate continuous energy expression is outlined
below. The energy function has four contributions: the interpolation term E_I, the data term
E_D, the line potential term E_L (corresponding to the potential V_c(l) in (5)) and the gain
term E_G:


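$$E_I = \sum_{i,j} \left[ (f_{i,j+1} - f_{i,j})^2 (1 - v_{i,j}) + (f_{i+1,j} - f_{i,j})^2 (1 - h_{i,j}) \right] \qquad (11a)$$

$$E_D = c_D \sum_{i,j} (f_{i,j} - d_{i,j})^2 \qquad (11b)$$

$$\begin{aligned}
E_L ={}& c_V \sum_{i,j} \left[ v_{i,j}(1 - v_{i,j}) + h_{i,j}(1 - h_{i,j}) \right]
 + c_P \sum_{i,j} \left[ v_{i,j} v_{i,j+1} + h_{i,j} h_{i+1,j} \right]
 + c_C \sum_{i,j} \left( v_{i,j} + h_{i,j} \right) \\
& + c_L \sum_{i,j} v_{i,j} \left[ (1 - v_{i+1,j} - h_{i,j} - h_{i,j+1})^2 + (1 - v_{i-1,j} - h_{i-1,j} - h_{i-1,j+1})^2 \right] \\
& + c_L \sum_{i,j} h_{i,j} \left[ (1 - h_{i,j+1} - v_{i,j} - v_{i+1,j})^2 + (1 - h_{i,j-1} - v_{i,j-1} - v_{i+1,j-1})^2 \right]
\end{aligned} \qquad (11c)$$

$$E_G = c_G \sum_{i,j} \left( \int_0^{v_{i,j}} g^{-1}(v)\, dv + \int_0^{h_{i,j}} g^{-1}(h)\, dh \right) \qquad (11d)$$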


Here f_{i,j}, v_{i,j} and h_{i,j} correspond to the depth, the vertical line process and the horizontal
line process respectively. The first term in E_L forces the line process to the corners of the
hypercube, i.e. to either 0 or 1. The second term penalizes the formation of adjacent parallel
lines while the third term represents the cost that must be paid for the introduction of
every single line. The fourth term is an interaction term which favours continuous lines and
penalizes both multiple line intersections and discontinuous line segments. The gain term
forces the line process inside the hypercube [0,1]^N. Figure 2b illustrates the connections
required to implement this energy function within the line and the depth lattices. Following
Hopfield (1984), we choose the following update rule:


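$$\frac{df_{i,j}}{dt} = -\frac{\partial E}{\partial f_{i,j}} \qquad (12a)$$

$$\frac{dm_{i,j}}{dt} = -\frac{\partial E}{\partial v_{i,j}} \qquad (12b)$$

$$\frac{dn_{i,j}}{dt} = -\frac{\partial E}{\partial h_{i,j}} \qquad (12c)$$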


where m_{i,j} and n_{i,j} are the internal state variables for the processing elements corresponding
to the vertical and horizontal line processes; that is, v_{i,j} = g(m_{i,j}) and h_{i,j} = g(n_{i,j}). It is
easy to see that for this update the total energy will always decrease.¹ The system will
evolve in such a manner as to find a minimum of E. Our energy function E differs from the
energy function chosen by Hopfield and Tank (1985) in two important respects. First, our
energy function contains cubic terms (e.g. f_{i,j+1} · f_{i,j} · v_{i,j}), which implies quadratic terms in
the update rule, in contrast to Hopfield’s linear update rule. Second, our network consists
of two networks: the first one corresponding to the inherently continuous surface depth and
the second one associated with the inherently binary line processes. We will demonstrate
empirically in the next section that this system seems to find next-to-optimal solutions to
the surface reconstruction problem, even though no formal proof showing the convergence
to the global minimum of E exists.

¹ Note that the gain term E_G makes a simple contribution, proportional to -m_{i,j} and -n_{i,j},
to the right-hand sides of (12b) and (12c) and no contribution at all to (12a).
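
The statement that the total energy always decreases under the update rule (12) can be spelled out with the chain rule. The following short derivation is a sketch that assumes only that g is differentiable and monotonically increasing, so that g' > 0, with v_{i,j} = g(m_{i,j}) and h_{i,j} = g(n_{i,j}):

$$\begin{aligned}
\frac{dE}{dt} &= \sum_{i,j} \left[ \frac{\partial E}{\partial f_{i,j}} \frac{df_{i,j}}{dt}
 + \frac{\partial E}{\partial v_{i,j}}\, g'(m_{i,j}) \frac{dm_{i,j}}{dt}
 + \frac{\partial E}{\partial h_{i,j}}\, g'(n_{i,j}) \frac{dn_{i,j}}{dt} \right] \\
&= -\sum_{i,j} \left[ \left( \frac{\partial E}{\partial f_{i,j}} \right)^2
 + g'(m_{i,j}) \left( \frac{\partial E}{\partial v_{i,j}} \right)^2
 + g'(n_{i,j}) \left( \frac{\partial E}{\partial h_{i,j}} \right)^2 \right] \le 0 .
\end{aligned}$$

Equality holds only where all three partial derivatives vanish, that is, at stationary points of E.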
4. Simulation Results and Heuristics


An analog network for a 32 by 32 image was simulated on a digital computer. Preliminary 
explorations of parameter values showed that the performance of the system depends on 
the relative weight of the different terms in the energy function. In particular, two sets of 
distinct parameters are important. First, the weight of the term describing the interaction
among the line processes, E_L, versus the weight of the smoothing term, in our case equal
to 1, and secondly, the relative weight of the different components of E_L and E_G, i.e.,
c_V, c_L, c_P, c_C and c_G. The first ratio ultimately determines the limiting depth gradient beyond
which no more interpolation takes place. Decreasing the importance of the line interaction 
term E L versus the smoothing term encourages the formation of lines at smaller and smaller 
depth gradients. Thus, this number requires some rough estimate of the limiting depth 
gradient for which no smooth interpolation should occur. The second set of parameters 
determines to what extent adjacent parallel lines form, intersections of lines occur, etc. We
determined a parameter set giving reasonable solutions. Fortunately, the choice of these
parameters does not seem to vary from image to image. Results in this paper refer to
parameters set at c_V = 0.5, c_L = <1.0, c_P = 5.0, c_C = 1.0 and c_G = 0.5. As boundary conditions,
we set horizontal lines at the two horizontal boundaries of the square image and
vertical lines at the vertical boundaries. Thus, the image is effectively decoupled from the
outside. As initial conditions we set the internal state variables of the line process neurons,
m_{i,j} and n_{i,j}, to 0, so that all horizontal and vertical lines start at 0.5. In other words, the initial
starting point is the middle of the hypercube [0,1]^N, explicitly biasing no single position.

The final state of the network should approximate as closely as possible the state of 
lowest energy. Since f and l are independent variables, the solution set can be found
by minimizing E(f,l) for a given fixed arrangement of the line processes by varying f.
These considerations dictate the following strategy. After initializing the depth lattice with 
the sparsely sampled depth data, the network computes the smoothest surface assuming 
all line processes set to 0. Thus, the initial state of the network is an everywhere smooth 
surface. This process converges in about two to three time constants. Subsequently, the 
depth network is updated ten times for every single update of the line process network. 
Functionally, this is equivalent to assuming that the depth network is stationary with regard 
to the line process network. In other words, the time constant of the depth network is a 
tenth of the time constant of the line process network, τ = C_i R_i. In the following, we will
always refer to the elapsed time in terms of τ. The time step for the numerical integration was set
to 0.01τ.
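The strategy just described (smooth first with all lines off, then update the depth lattice ten times per line update) can be sketched in one dimension. This is a reduced illustration rather than the authors' 32 by 32 simulation: it keeps only the interpolation, data, line-cost, corner-forcing and gain terms, omits the parallel-line and interaction terms, and uses a hypothetical noisy step sampled at roughly every second point. The value c_L = 0.3, the step height, the noise level and the Euler step are assumptions chosen so that the toy problem is well behaved; c_V, c_G and λ = 16 follow the text.

```python
import numpy as np

def g(m, lam=16.0):
    """Sigmoid of equation (7): internal state -> line value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-2.0 * lam * m))

def depth_gradient(f, l, data, mask, c_D):
    """dE/df for the 1-D energy with continuous line variables l."""
    flux = 2.0 * (1.0 - l) * np.diff(f)
    grad = 2.0 * c_D * mask * (f - data)
    grad[:-1] -= flux
    grad[1:] += flux
    return grad

def reconstruct(data, mask, c_D=1.0, c_L=0.3, c_V=0.5, c_G=0.5,
                dt=0.05, presmooth=1000, outer=400, depth_per_line=10):
    f = np.where(mask, data, 0.0)
    no_lines = np.zeros(len(f) - 1)
    for _ in range(presmooth):                 # smoothest surface, all lines off
        f = f - dt * depth_gradient(f, no_lines, data, mask, c_D)
    m = np.zeros(len(f) - 1)                   # lines start at g(0) = 0.5
    for _ in range(outer):
        l = g(m)
        for _ in range(depth_per_line):        # depth lattice is ten times faster
            f = f - dt * depth_gradient(f, l, data, mask, c_D)
        # dE/dl: interpolation, line cost, corner-forcing and gain contributions
        dE_dl = -np.diff(f) ** 2 + c_L + c_V * (1.0 - 2.0 * l) + c_G * m
        m = m - dt * dE_dl
    return f, g(m)

n = 32
idx = np.arange(n)
truth = np.where(idx < 16, 0.0, 2.0)           # hypothetical depth step
# Every second point is sampled, plus both points adjacent to the step.
mask = np.where(idx < 16, (idx % 2 == 0) | (idx == 15),
                (idx % 2 == 1) | (idx == 16))
rng = np.random.default_rng(0)
data = truth + rng.normal(0.0, 0.02, n)        # mild Gaussian noise (assumed level)
f, l = reconstruct(data, mask)
```

In this toy setting the only line variable that saturates near 1 is the one between points 15 and 16, and the surface on either side relaxes to the two plateaus instead of being smoothed across the step.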

Figure 1 illustrates dramatically the difference between smooth and piecewise smooth 
surface interpolation. Note that depth measurements are available only at every second
Figure 3. Temporal evolution of the states of the network for a sparsely sampled synthetic
scene containing three slanting rectangles. Interpolation using a membrane-type energy function.
On average every third point is sampled. These measurements are corrupted by Gaussian noise
(σ = 0.25). The top left figure shows the initial state of the network after smoothing. The following
illustrations show the changing states of the network, clearly revealing three rectangles. Note that
in order to reconstruct such objects, there is a critical number of sampling points per object, below
which the reconstruction yields very ambiguous results. Time is specified in terms of τ. A variable
coupling was assumed.


point (on the average) along the contours marked with arrows. Thus, only about 5% of all 
points in the image are sampled. The results speak for themselves. 

How does the choice of λ affect our results? For λ >> 1, that is in the high-gain limit, the
l_i’s will almost always be either 0 or 1. Conversely, for λ << 1, values of l_i will be evenly
distributed between 0 and 1. Experimentally, the only difference between a run with low or
high gain is the convergence time. Runs with low λ usually took much longer to converge,
since the values of l_i were distributed within the hypercube [0,1]^N and the first term in E_L
explicitly penalizes these values. The final solution set appeared to be the same, no matter
what value of λ was chosen. For all our simulations λ = 16. Under these conditions, v_{i,j}
and h_{i,j} are almost always either 0 or 1.

Figures 3 and 4 show more complicated synthetic images. To reconstruct the original, fully 
sampled, image as closely as possible, we introduced the following procedure. Most images 
contain, unlike figure 1, more than a single depth scale. That is, the depth gradient may 
vary greatly throughout an image, reflecting the fact that objects are located at different 
depths. As we discussed before, the relative weight of the line interaction energy E_L to the
interpolation term E_I governs the scale at which no more smooth interpolation takes place
and the surface breaks. Therefore, one way to scan for different depth values is to change
the weight of E_L during a simulation run. We multiplied E_L by a factor 1/K(t), where K(t)
starts out small (typically at 0.1) and increases linearly until a given saturation threshold.
In other words, initially the formation of lines is strongly penalized, encouraging a smooth
interpolation everywhere except at very steep disparity gradients. Subsequently, by paying
a smaller and smaller price for the formation of lines, the surface will break at smaller and
smaller depth gradients. K(t) is bounded from above, since noise will otherwise lead to the
creation of lines everywhere. The final state of the network is independent of the speed at
which K(t) changes, as long as it increases slowly enough. Interestingly, increasing K(t)
gives rise to new lines while older lines persist. In no case did previously established lines
fade away, although we are unable to prove this assertion. Note that the evolution of this
nonlinear dynamical system can best be described as a process involving the minimization 
of an energy functional at each stage. This energy functional varies from stage to stage 
(see also Terzopoulos, 1985). 
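
The variable coupling can be written down as a simple schedule. The sketch below assumes that E_L is scaled by 1/K(t), which is what the description implies (a strong penalty on lines at first, a smaller and smaller price later). Only the starting value 0.1 comes from the text; the growth rate and the saturation value are hypothetical, as are the function names.

```python
def coupling_gain(t, k0=0.1, rate=0.05, k_max=1.0):
    """Piecewise-linear K(t): starts at k0, grows linearly, saturates at k_max.

    rate and k_max are illustrative assumptions; only k0 = 0.1 is from the text.
    """
    return min(k0 + rate * t, k_max)

def line_energy_weight(t, **kwargs):
    """Factor 1/K(t) assumed to multiply E_L at time t (in units of tau)."""
    return 1.0 / coupling_gain(t, **kwargs)

weights = [line_energy_weight(t) for t in (0.0, 5.0, 10.0, 50.0)]
```

The resulting weights decrease monotonically until K(t) saturates, after which the price of a line stays fixed so that noise alone cannot create lines everywhere.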

5. Discussion 

The results we have presented indicate the plausibility of using graded networks of simple 
neuron-like processing elements to "solve" constraint satisfaction problems in early vision 
that can be formulated as minimization of a convex or a non-convex energy function. 
We have simulated on a digital computer a network for one particular well characterized 
problem, reconstructing surfaces given noisy and sparse depth measurements. Although 
this particular task has no "optimal" solution, since reconstructing a complex surface from 
insufficient data allows a potentially infinite number of solutions, our simulations indicate 
that the reconstructed image appears to be at least "very good". In particular, the solutions 
are similar to the solutions obtained using simulated annealing or other algorithms derived 
from estimation theory (Marroquin, 1984, 1985b). 

Many early vision problems can be formulated in terms of minimizing an energy function 
or as finding optimal Bayesian estimates. Although these methods are by no means the 
only approaches to early vision, they do offer advantages, including elegant ways of 
Figure 4. Smooth and piecewise smooth surface reconstruction of a sparsely sampled synthetic
scene containing both flat and curved surfaces. Smoother surface interpolation can be obtained
by the use of a higher-order stabilizer, such as the thin plate stabilizer (Terzopoulos, 1985). The
network is shown after 1.6τ (for more details see figure 3).

representing constraints, or a priori knowledge of the world. Vision problems which have 
been formulated in this way include surface interpolation (Grimson, 1981, 1982; Terzopoulos, 
1985; Marroquin, 1984), edge detection (Poggio, Voorhees and Yuille, 1984), shape from shading
(Horn and Brooks 1984), velocity field estimation (Horn and Schunck, 1981; Hildreth, 1984) 
and color (Hurlbert, 1985; Poggio et al., 1985). 

The single most important advantage of analog, parallel networks, independent of their 
implementation, is their speed. Typical convergence times are on the order of several 
system time constants. Thus, the convergence times will be of the order of 10 to 100 ms
for neuronal hardware and of the order of 10 to 100 ns for semiconductor circuits. An
image similar to figure 3 but on a 128 by 128 grid yields similar convergence times. The
convergence time does not depend per se on the size of the image array but rather on the
size of the largest patch of smooth surface in the image. Since the quadratic smoothing
term mimics a diffusion, it takes on the order of n^2 time constants for information to
propagate across the smooth patch, n pixels across. This behavior contrasts favorably 
with simulated annealing (Kirkpatrick et al., 1983; Hinton and Sejnowski, 1983). This latter 
technique, although guaranteed to converge asymptotically to the global minimum, is often 
very slow. 
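
The n^2 scaling can be made explicit in a one-dimensional analogue. The following is a sketch under simplifying assumptions: all lines inside the patch are off, there are no data points in its interior, and its ends are free. The depth update then reduces to a discrete diffusion equation,

$$\frac{df_i}{dt} = 2\,(f_{i+1} - 2 f_i + f_{i-1}),$$

whose spatial modes $f_i \propto \cos(k\pi i / n)$ on a patch n pixels across decay at the rates

$$\mu_k = 4\left(1 - \cos\frac{k\pi}{n}\right) \approx \frac{2 k^2 \pi^2}{n^2} \quad (k \ll n).$$

The slowest non-constant mode (k = 1) therefore relaxes on a time scale of roughly $n^2 / (2\pi^2)$ time constants, which grows quadratically with the width of the patch and is independent of the size of the full image array.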

It is interesting that our attempts to enhance the final solution by changing λ during a
single run did not have any apparent effect, except to increase the convergence time. In
the Ising spin interpretation of Hopfield’s binary networks (Huang, 1963; Hopfield and Tank,
1985) λ can be interpreted as temperature T of the system and changing λ during a run
Figure 5. Schematic diagram of a hybrid massively parallel machine for a real-time artificial vision
system. The analog network, built using constant Ohmic resistances, minimizes a piecewise
quadratic energy expression while the digital processors (indicated as solid squares) compute an
arbitrary, nonlinear energy expression. The digital processors corresponding to the line process set
or break the resistive connections in the analog network with the help of one or two transistors.
The input data for the resistive network (positive currents) and the output of the network (positive
voltages) are read in/out through the digital processors after suitable A/D or D/A conversion. Note
that the spatial resolution of the analog network is half of the resolution of the digital network. This
scheme combines the speed of a simple analog network with the versatility of a digital processor
and would permit real-time execution of vision algorithms.

would be tantamount to simulated annealing.

The principal drawback of our method is that there is no guarantee that the network will
converge to a state of lowest energy. Similar to the network solutions to the Travelling 
Salesman Problem (Hopfield and Tank, 1985), we can only show experimentally that 
the computed solutions seem reasonable compared with solutions obtained with other 
algorithms. Thus, our network appears to find a state of low (if not the
lowest) energy. As Hopfield and Tank (1985) have pointed out, the main reason for this
good behavior is the smoothing of the solution space upon transforming the problem from 
a discrete, binary space into a continuous one. 

The energy expression for the surface interpolation problem (equation 11a) contains, 
unlike Hopfield’s corresponding energy function (equation 9), cubic terms; that is, the 
corresponding update equation (12) contains a multiplication of two variables. A standard 
way of implementing a multiplication within analog microelectronic circuits is to change 
the conductance of a transistor in a voltage dependent manner (variable transconductance 
multiplier; see for instance Grebene, 1972). As first proposed by Torre and Poggio (1978;
see also Koch, Poggio and Torre, 1982) there is experimental evidence that a similar
mechanism may be implemented within the dendritic trees of nerve cells. An inhibitory
synapse with a reversal potential close or equal to the resting potential of the neuron (a
so-called shunting or silent inhibition) interacts with an excitatory synapse in a multiplicative
manner.

What are the implications of our results for the architecture of artificial vision systems?
How feasible is the construction of truly analog integrated network circuits? The network
we have proposed for solving the surface reconstruction problem is a simple one, insofar
as the weights between the different processing elements are positive, global variables
and do not depend on the location in the network. This is, unfortunately, not always to
be expected. For instance, the resistances in the linear network Poggio and Koch (1985)
have proposed for computing the smoothest velocity field (Hildreth, 1984) do depend on
the location, making an integrated circuit implementation difficult. Both linear and nonlinear
transfer functions can be built, within limits, using standard circuit technologies. The
technology of choice would most likely be a variant of the MOS technology. A variable
weighting term, such as K(t), would be difficult to implement with any degree of accuracy.

Purely analog networks do have one major drawback with regard to conventional
programmable processors. Once a particular analog network has been built, it is difficult to
change its parameters, such as the specific form of the penalizing terms, or to adapt
the network to a different use. Thus, every task would require a dedicated analog network.
A second problem is handling the massive dataflow from and to the individual processors.
A possible solution to this dilemma is a mixed, hybrid architecture, where full advantage
is taken of the speed of analog circuit components and the versatility of programmable
processors.

The observation that a great number of early vision problems can be formulated as either
minimizing a quadratic energy or as minimizing an energy containing a quadratic term
(Poggio et al., 1985) leads to the following scheme for such a hybrid machine. Consider a
regular grid with a number of simple serial and programmable processors at every node,
such as the Connection Machine currently under development at MIT and TMC (Hillis, 1984).
Every processor is connected to its four neighbours (figure 5). Superimposed onto this 
Manhattan-like geometry is a regular network with constant Ohmic resistances, with half the 
spatial resolution of the digital network. The processors corresponding to the line elements 
have the ability to break the resistive connection between two neighbouring processors 
with the aid of a simple switch. The processors "above" the nodes of the linear network 
have at any time the option to read in/out from the analog network. 

The hybrid machine has two basic cycles. In its analog cycle, the processors inject 
(positive) current, corresponding to the depth measurements, into the analog network.
Subsequently, the resistive network finds the smoothest surface, given a certain distribution
of lines, mimicked by the breaking of resistances between nodes. It is guaranteed to find
the smoothest surface for a given line distribution, since the network will converge
to a state of least power dissipation, a quadratic functional (Poggio and Koch, 1985).
Convergence is expected to be fast in comparison with the typical execution times of
simple operations in the processors. In the digital cycle, the processor network will read
out the current voltage at every node in the resistive network, corresponding to the
reconstructed depth value, and compute a new estimate for the now binary line process
using conventionally programmed digital software. Subsequently, the processors will set or 
break the appropriate connections in the analog network. This hybrid system thus switches 
between the analog and the digital mode, essentially implementing our solution strategy. 
Changing the weight or the form of the energy expression can be done with the help of 
the digital processors, offering greater versatility than a purely analog system. Since the 
digital processors are "where the data is", such a scheme alleviates or strongly reduces 
the data flow problem. Note that the conversion of the data from digital to analog and vice 
versa implies delays between cycles, probably on the order of several microseconds if the 
accuracy of the depth variable is no more than 6 to 8 bits. 
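
The alternation between the two cycles can be sketched in one dimension. In this illustration the analog cycle is replaced by an exact linear solve of the quadratic problem for fixed binary lines (standing in for the resistive network settling), and the digital cycle uses a simple threshold on the squared depth difference, which is an assumed stand-in for whatever line estimate the processors would actually compute. The function names, the noise-free step data and the value c_L = 0.3 are all hypothetical.

```python
import numpy as np

def analog_cycle(data, mask, lines, c_D=1.0):
    """Minimizer of the quadratic energy for fixed binary lines (linear solve)."""
    w = 1.0 - lines                        # link conductance: 0 where broken
    A = np.diag(c_D * mask.astype(float))
    for i, wi in enumerate(w):
        A[i, i] += wi
        A[i + 1, i + 1] += wi
        A[i, i + 1] -= wi
        A[i + 1, i] -= wi
    return np.linalg.solve(A, c_D * mask * data)

def digital_cycle(f, c_L=0.3):
    """Assumed digital rule: break a link when its squared difference exceeds c_L."""
    return (np.diff(f) ** 2 > c_L).astype(float)

def hybrid(data, mask, cycles=5):
    lines = np.zeros(len(data) - 1)
    for _ in range(cycles):
        f = analog_cycle(data, mask, lines)
        new_lines = digital_cycle(f)
        if np.array_equal(new_lines, lines):   # line configuration has settled
            break
        lines = new_lines
    return f, lines

n = 32
idx = np.arange(n)
truth = np.where(idx < 16, 0.0, 2.0)           # hypothetical noise-free depth step
mask = np.where(idx < 16, (idx % 2 == 0) | (idx == 15),
                (idx % 2 == 1) | (idx == 16))
f, lines = hybrid(truth, mask)
```

On this toy input the scheme settles after two analog cycles with a single broken link at the step. Every unbroken segment must contain at least one sampled point, since otherwise the linear system in `analog_cycle` is singular.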